This work presents an up-to-date overview of how to architect, design, and optimize deep neural networks (DNNs) to improve performance while preserving accuracy. The paper covers a set of optimizations that span the whole machine learning processing pipeline. We introduce two types of optimizations: the first modifies the DNN model and requires retraining, while the second does not. We focus on GPU optimizations, but we believe the presented techniques can be used on other AI inference platforms as well. To demonstrate the DNN model optimizations, we improve one of the state-of-the-art deep network architectures for optical flow, RAFT (arXiv:2003.12039), on a popular edge AI inference platform, the Nvidia Jetson AGX Xavier.
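As a minimal sketch of the "no retraining" class of optimizations mentioned above, the snippet below casts a trained PyTorch flow model to FP16 and exports it to ONNX so it can be compiled by a GPU inference runtime such as TensorRT on a Jetson device. The `load`-style model argument, the input resolution, and the opset choice are assumptions for illustration, not the paper's code.

```python
# Sketch: post-training FP16 export of an optical-flow model for edge inference.
import torch

def export_fp16_onnx(model: torch.nn.Module, onnx_path: str = "raft_fp16.onnx"):
    model = model.eval().cuda().half()  # post-training precision cast, no retraining
    # RAFT-style optical flow consumes two consecutive frames.
    frame1 = torch.randn(1, 3, 440, 1024, device="cuda", dtype=torch.half)
    frame2 = torch.randn(1, 3, 440, 1024, device="cuda", dtype=torch.half)
    torch.onnx.export(
        model, (frame1, frame2), onnx_path,
        input_names=["frame1", "frame2"], output_names=["flow"],
        opset_version=16,
    )
    # The resulting ONNX graph can then be compiled offline, e.g.:
    #   trtexec --onnx=raft_fp16.onnx --fp16 --saveEngine=raft_fp16.plan
```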
This study aims to enable more reliable automated post-hurricane building damage classification using artificial intelligence (AI) and multi-view imagery. Current practice and research efforts in adopting AI for post-disaster damage assessment are generally (a) qualitative, lacking a refined classification of building damage levels based on standard damage scales, and (b) trained on aerial or satellite imagery with limited views which, although indicative, do not fully describe the damage scale. To enable a more accurate and reliable automated quantification of damage levels, this study proposes the use of more comprehensive visual data in the form of multiple ground and aerial views of the buildings. To learn such a spatially-aware damage prediction model, a multi-view convolutional neural network (MV-CNN) architecture is used that combines information from different views of a damaged building. This spatial 3D contextual damage information leads to more accurate identification of damage and a more reliable quantification of damage levels. The proposed model is trained and validated on a reconnaissance visual dataset containing expert-labeled, geotagged images of buildings inspected after Hurricane Harvey. The developed model demonstrates reasonable accuracy in predicting damage levels and can be used to support more informed and reliable AI-assisted disaster management practices.
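The sketch below illustrates the multi-view idea described above (a shared CNN encodes each ground/aerial view, per-view features are pooled, and a classifier predicts the damage level). It is not the paper's implementation; the backbone, feature size, number of views, and number of damage classes are assumptions.

```python
# Sketch: MV-CNN-style view pooling for building damage level classification.
import torch
import torch.nn as nn
from torchvision import models

class MultiViewDamageClassifier(nn.Module):
    def __init__(self, num_damage_levels: int = 5):
        super().__init__()
        backbone = models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # shared per-view encoder
        self.classifier = nn.Linear(512, num_damage_levels)

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, num_views, 3, H, W)
        b, v, c, h, w = views.shape
        feats = self.encoder(views.view(b * v, c, h, w)).flatten(1)  # (b*v, 512)
        pooled = feats.view(b, v, -1).max(dim=1).values              # pool across views
        return self.classifier(pooled)                               # damage-level logits

# Example: 4 views per building at 224x224.
logits = MultiViewDamageClassifier()(torch.randn(2, 4, 3, 224, 224))
```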
The recent success of deep learning applications has coincided with the wide availability of powerful computational resources for training sophisticated machine learning models on huge datasets. Nonetheless, training large models such as convolutional neural networks using model parallelism (as opposed to data parallelism) is challenging, because the complex communication between model shards makes it difficult to partition the computation efficiently across multiple machines with an acceptable trade-off. This paper presents SplitBrain, a high-performance distributed deep learning framework supporting hybrid data and model parallelism. Specifically, SplitBrain provides layer-specific partitioning that co-locates the compute-intensive convolutional layers while sharding the memory-demanding layers. A novel scalable group communication scheme is proposed to further improve training throughput by reducing communication overhead. The results show that SplitBrain can achieve nearly linear speedup while saving up to 67% of the memory consumption for data- and model-parallel VGG on CIFAR-10.
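As a conceptual sketch of the hybrid partitioning described above (not the SplitBrain implementation), the code below keeps the compute-heavy convolutional layers whole, as they would be replicated for data parallelism, while a memory-heavy fully connected layer is sharded column-wise so that each worker would store only a slice of its weights. The shard count and layer sizes are illustrative assumptions.

```python
# Sketch: hybrid data/model parallelism via column-sharded fully connected layers.
import torch
import torch.nn as nn

class ColumnShardedLinear(nn.Module):
    """A Linear layer split into column slices; in a distributed setting each
    slice would live on a different worker or device."""
    def __init__(self, in_features: int, out_features: int, shards: int = 4):
        super().__init__()
        assert out_features % shards == 0
        self.slices = nn.ModuleList(
            nn.Linear(in_features, out_features // shards) for _ in range(shards)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Equivalent to one big Linear: concatenate the partial outputs
        # (an all-gather in the distributed case).
        return torch.cat([s(x) for s in self.slices], dim=-1)

class HybridVGGLike(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Co-located, replicated (data-parallel) convolutional feature extractor.
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        # Sharded (model-parallel) memory-demanding classifier.
        self.classifier = nn.Sequential(
            ColumnShardedLinear(128 * 4 * 4, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

logits = HybridVGGLike()(torch.randn(8, 3, 32, 32))  # CIFAR-10-sized input
```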
Group activity recognition (GAR) detects the activity performed by a group of actors in a short video clip. The task requires a compositional understanding of scene entities and relational reasoning among them. We approach GAR by modeling the video as a sequence of tokens that represent multi-scale semantic concepts in the video. We propose COMPOSER, a multi-scale Transformer-based architecture that performs attention-based reasoning over tokens at each scale and learns group activity compositionally. Moreover, we use only the keypoint modality, which reduces scene biases and improves the generalization ability of the model. We improve the multi-scale representations in COMPOSER by clustering the intermediate-scale representations while maintaining consistent cluster assignments across scales. Finally, we use techniques such as auxiliary prediction and novel data augmentations (e.g., Actor Dropout) to aid model training. We demonstrate the strength and interpretability of the model on the challenging Volleyball dataset. COMPOSER achieves a new state-of-the-art accuracy of 94.5% with the keypoint modality alone. COMPOSER outperforms the latest GAR methods that rely on RGB signals and compares favorably to methods that exploit multiple modalities. Our code will be available.
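The toy sketch below illustrates the keypoint-token idea: each actor's 2D keypoints are embedded into a token, a Transformer encoder reasons over the actor tokens, and a pooled representation is classified into a group activity. It is not the COMPOSER implementation; the keypoint count, embedding size, and number of activity classes are assumptions, and the multi-scale clustering is omitted.

```python
# Sketch: attention over per-actor keypoint tokens for group activity recognition.
import torch
import torch.nn as nn

class KeypointGroupActivityModel(nn.Module):
    def __init__(self, num_keypoints=17, d_model=128, num_activities=8):
        super().__init__()
        self.token_embed = nn.Linear(num_keypoints * 2, d_model)  # one token per actor
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=2)
        self.head = nn.Linear(d_model, num_activities)

    def forward(self, keypoints: torch.Tensor) -> torch.Tensor:
        # keypoints: (batch, num_actors, num_keypoints, 2), normalized coordinates
        tokens = self.token_embed(keypoints.flatten(2))  # (batch, actors, d_model)
        tokens = self.encoder(tokens)                    # attention across actors
        return self.head(tokens.mean(dim=1))             # group-activity logits

logits = KeypointGroupActivityModel()(torch.rand(2, 12, 17, 2))  # 12 actors per clip
```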
Conditional generative adversarial networks (cGANs) extend the standard unconditional GAN framework to learning the joint data-label distribution from samples, and have been established as powerful generative models capable of producing high-fidelity images. A challenge in training such a model lies in properly infusing class information into its generator and discriminator. For the discriminator, class conditioning can be achieved by either (1) directly incorporating labels as input or (2) involving labels in an auxiliary classification loss. In this paper, we show that the former directly aligns the class-conditioned fake and real data distributions $p(\text{image}|\text{class})$ (data matching), while the latter aligns the data-conditioned class distributions $p(\text{class}|\text{image})$ (label matching). Although class separability does not directly translate to sample quality, and becomes a burden if classification itself is intrinsically difficult, the discriminator cannot provide useful guidance to the generator if the features of different classes are mapped to the same point and thus become inseparable. Motivated by this intuition, we propose a Dual Projection GAN (P2GAN) model that learns to balance between data matching and label matching. We then propose an improved cGAN model with auxiliary classification that directly aligns the fake and real conditionals $p(\text{class}|\text{image})$ by minimizing their $f$-divergence. Experiments on a synthetic mixture-of-Gaussians (MoG) dataset and a variety of real-world datasets, including CIFAR100, ImageNet, and VGGFace2, demonstrate the efficacy of our proposed models.
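The schematic sketch below shows a discriminator that combines the two conditioning routes discussed above: a projection term over label embeddings (data matching) and an auxiliary classification head (label matching). It illustrates the idea rather than the P2GAN reference code; the feature extractor and layer sizes are assumptions.

```python
# Sketch: discriminator with a projection term plus an auxiliary classifier head.
import torch
import torch.nn as nn

class DualProjectionDiscriminator(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 256):
        super().__init__()
        self.phi = nn.Sequential(                      # image feature extractor phi(x)
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, feat_dim, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.psi = nn.Linear(feat_dim, 1)                 # unconditional real/fake score
        self.embed = nn.Embedding(num_classes, feat_dim)  # projection: <embed(y), phi(x)>
        self.aux_classifier = nn.Linear(feat_dim, num_classes)  # p(class|image) head

    def forward(self, x: torch.Tensor, y: torch.Tensor):
        h = self.phi(x)
        adv = self.psi(h) + (self.embed(y) * h).sum(dim=1, keepdim=True)  # data matching
        class_logits = self.aux_classifier(h)                             # label matching
        return adv, class_logits

adv_score, class_logits = DualProjectionDiscriminator(num_classes=100)(
    torch.randn(4, 3, 32, 32), torch.randint(0, 100, (4,))
)
```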
Artificial intelligence (AI) and robotic coaches promise improved engagement of patients in rehabilitation exercises through social interaction. While previous work explored the potential of automatically monitoring exercises for AI and robotic coaches, the deployment of these systems remains a challenge. Previous work identified the lack of stakeholder involvement in designing such functionalities as one of the major causes. In this paper, we present our efforts to elicit detailed design specifications for how AI and robotic coaches could interact with and guide patients' exercises in an effective and acceptable way, working with four therapists and five post-stroke survivors. Through iterative questionnaires and interviews, we found that both post-stroke survivors and therapists appreciated the potential benefits of AI and robotic coaches for achieving more systematic management and improving their self-efficacy and motivation in rehabilitation therapy. In addition, our evaluation sheds light on several practical concerns (e.g., possible difficulty with the interaction for people with cognitive impairment, system failures, etc.). We discuss the value of early stakeholder involvement and of interaction techniques that compensate for system failures and support personalized therapy sessions, for the better deployment of AI and robotic exercise coaches.
The success of CNNs in various applications is accompanied by a significant increase in the computation and parameter storage costs. Recent efforts toward reducing these overheads involve pruning and compressing the weights of various layers without hurting original accuracy. However, magnitude-based pruning of weights reduces a significant number of parameters from the fully connected layers and may not adequately reduce the computation costs in the convolutional layers due to irregular sparsity in the pruned networks. We present an acceleration method for CNNs, where we prune filters from CNNs that are identified as having a small effect on the output accuracy. By removing whole filters in the network together with their connecting feature maps, the computation costs are reduced significantly. In contrast to pruning weights, this approach does not result in sparse connectivity patterns. Hence, it does not need the support of sparse convolution libraries and can work with existing efficient BLAS libraries for dense matrix multiplications. We show that even simple filter pruning techniques can reduce inference costs for VGG-16 by up to 34% and ResNet-110 by up to 38% on CIFAR10 while regaining close to the original accuracy by retraining the networks.
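Below is a minimal sketch of the magnitude-based filter pruning described above: rank the filters of a convolutional layer by their L1 norm and rebuild the layer with only the highest-ranked filters (shrinking the following BatchNorm/conv layers accordingly is omitted). The pruning ratio is an illustrative choice, not a value from the paper.

```python
# Sketch: L1-norm filter pruning that yields a smaller, still-dense conv layer.
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float = 0.66) -> nn.Conv2d:
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    # L1 norm of each filter's weights, shape (out_channels,)
    l1 = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = torch.argsort(l1, descending=True)[:n_keep]
    pruned = nn.Conv2d(
        conv.in_channels, n_keep, conv.kernel_size,
        stride=conv.stride, padding=conv.padding, bias=conv.bias is not None,
    )
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

conv = nn.Conv2d(64, 128, 3, padding=1)
smaller = prune_conv_filters(conv)          # smaller dense conv, no sparse kernels needed
out = smaller(torch.randn(1, 64, 32, 32))   # still a regular dense convolution
```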